LLVM and SPIRV-LLVM-Translator pulldown (WW15 2026)#21723

Merged
bb-sycl merged 3236 commits into sycl from llvmspirv_pulldown
Apr 17, 2026

@iclsrc
Collaborator

@iclsrc iclsrc commented Apr 10, 2026

tbaederr and others added 30 commits March 28, 2026 05:22
…aries (#189044)

We only did this for local variables but were missing it for
globals.
…ardOperands API to BranchOpInterface (#187864)

To simplify the output of the reduction-tree pass, this PR introduces
eraseRedundantBlocksInRegion. For regions containing multiple
execution paths, this functionality selects the shortest 'interesting'
path. Additionally, this PR adds the getSuccessorForwardOperands API to
BranchOpInterface. This allows us to extract the ForwardOperands for a
specific path chosen from multiple alternatives, enabling the creation
of a cf.br operation for the redirected jump.
…on index (#188508)

When a dynamic index of -1 (the kPoisonIndex sentinel) was folded into
the static position of a vector.insert op,
foldDenseElementsAttrDestInsertOp would proceed to call
calculateInsertPosition, which returned -1. The subsequent iterator
arithmetic (allValues.begin() + (-1)) was undefined behaviour, causing
an assertion in DenseElementsAttr::get.

Fix by bailing out early in foldDenseElementsAttrDestInsertOp when any
static position equals kPoisonIndex, consistent with how
InsertChainFullyInitialized already guards this case.
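
The early bail-out described above can be modeled outside MLIR. The following Python sketch is only an illustration under stated assumptions: `K_POISON_INDEX`, `linear_offset`, and `fold_insert` are hypothetical stand-ins, not the actual C++ functions, and the model only covers a scalar insert with a full-rank position. It shows how a sentinel of -1 turns into a negative linear offset unless the fold refuses to proceed.

```python
from typing import List, Optional, Sequence

K_POISON_INDEX = -1  # sentinel value named in the commit message


def linear_offset(shape: Sequence[int], position: Sequence[int]) -> int:
    """Row-major linear offset of a full-rank `position` inside `shape`."""
    offset = 0
    for dim, idx in zip(shape, position):
        offset = offset * dim + idx
    return offset


def fold_insert(
    dest: List[int], shape: Sequence[int], position: Sequence[int], value: int
) -> Optional[List[int]]:
    """Fold a scalar insert into a constant, or return None to skip the fold."""
    # Guard: a poison index has no valid element to write, so do not fold.
    if any(idx == K_POISON_INDEX for idx in position):
        return None
    folded = list(dest)
    folded[linear_offset(shape, position)] = value
    return folded
```

Without the guard, a position such as `(0, K_POISON_INDEX)` in a 2x3 shape yields an offset of -1, which is the out-of-range iterator arithmetic the commit message describes.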

Fixes #188404

Assisted-by: Claude Code
…nt (#189163)

When invoking `-test-bytecode-roundtrip=test-dialect-version=X.Y` on a
module that contains no test dialect operations, the reader type
callback in `runTest0` called
`reader.getDialectVersion<test::TestDialect>()` and then immediately
asserted that it succeeded. However, if the test dialect was never
referenced in the bytecode (because no test dialect types appear in the
module), the dialect's version information is not stored in the
bytecode, so `getDialectVersion` legitimately returns failure.

When the test dialect version is unavailable in the bytecode being read,
the module contains no test dialect types, so no "funky"-group overrides
are needed and the callback can safely skip by returning `success()`.

A regression test is added with a module that has no test dialect ops,
exercising the `test-dialect-version=2.0` path that previously crashed.

Fixes #128321
Fixes #128325

Assisted-by: Claude Code
… (#188064)

This PR adds two new field specifiers (`operand` and `attribute`) and
extends the existing one (`result`):
- `default_factory` parameter is added for `result` and `attribute` to
specify default value via a lambda/function
- `kw_only` parameter is added for all these three specifiers, to make a
field a keyword-only parameter (without giving a default value).

```python
def result(
    *,
    infer_type: bool = False,
    default_factory: Optional[Callable[[], Any]] = None,
    kw_only: bool = False,
) -> Any: ...


def operand(
    *,
    kw_only: bool = False,
) -> Any: ...


def attribute(
    *,
    default_factory: Optional[Callable[[], Any]] = None,
    kw_only: bool = False,
) -> Any: ...
```

Examples of how to use them:
```python
class OperandSpecifierOp(TestFieldSpecifiers.Operation, name="operand_specifier"):
    a: Operand[IntegerType[32]] = operand()
    b: Optional[Operand[IntegerType[32]]] = None
    c: Operand[IntegerType[32]] = operand(kw_only=True)

class ResultSpecifierOp(TestFieldSpecifiers.Operation, name="result_specifier"):
    a: Result[IntegerType[32]] = result()
    b: Result[IntegerType[16]] = result(infer_type=True)
    c: Result[IntegerType] = result(
        default_factory=lambda: IntegerType.get_signless(8)
    )
    d: Sequence[Result[IntegerType]] = result(default_factory=list)
    e: Result[IntegerType[32]] = result(kw_only=True)

class AttributeSpecifierOp(
    TestFieldSpecifiers.Operation, name="attribute_specifier"
):
    a: IntegerAttr = attribute()
    b: IntegerAttr = attribute(
        default_factory=lambda: IntegerAttr.get(IntegerType.get_signless(32), 42)
    )
    c: StringAttr["a"] | StringAttr["b"] = attribute(
        default_factory=lambda: StringAttr.get("a")
    )
    d: IntegerAttr = attribute(kw_only=True)
```

---------

Co-authored-by: Rolf Morel <rolfmorel@gmail.com>
Summary:
These were renamed and the aliases removed, fix running the tests.
Signed-off-by: Shikhar Soni <shikharish05@gmail.com>
…89128)

This fixes #186684.

Also fix (not) breaking variables declared on the same line as the
closing brace.

And adapt whitesmith to those changes.
… broadcast from sg-to-wi (#185960)

This PR adds distribution patterns for vector.step, vector.shape_cast, and
vector.broadcast in the new sg-to-wi pass.
…. (#188721)

If a load and a store have different address spaces, we cannot create a
runtime check. Instead, always copy the data to an alloca matching the
store address space.

Fixes llvm/llvm-project#185236.

PR: llvm/llvm-project#188721
Need to check if the potential bitcast/bswap-like construct is a root of
the reduction, otherwise it cannot represent a bitcast/bswap construct.

Fixes #189184
FormatTest.cpp is too huge, extract some tests to mitigate this a bit.
#184545 default-enables the IO sandbox in assert-builds. This causes
Clang using Polly to crash (#188568).

The issue is that `PassBuilder` uses `vfs::getRealFileSystem()` by
default, which is considered an IO sandbox violation in the Clang process.
This PR stores the VFS of the `PassBuilder` from the original
`registerPollyPasses` call and uses it when creating other `PassBuilder` instances.

This PR also adds infrastructure for running Polly in `clang` (in
addition to `opt`). `opt` does not enable the sandbox, so we need
separate tests using Clang.

Closes: #188568
On musl, rlimit64 is an alias for rlimit rather than a distinct type
as it is in glibc. Add a SANITIZER_MUSL elif branch so that
struct_rlimit64_sz is defined for musl-based Linux targets.
…189199)

Use start + (end - start) / 2 instead of (start + end) / 2 to compute
the midpoint address. The original expression overflows when start + end
exceeds UPTR_MAX, which happens on 32-bit targets whose memory layout
includes regions above 0x80000000.
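
The overflow is easy to reproduce with concrete numbers. The sketch below simulates 32-bit unsigned arithmetic in Python by masking; the helper names (`wrap`, `midpoint_naive`, `midpoint_safe`) and the 32-bit width are assumptions made for illustration, not code from the sanitizer runtime.

```python
UPTR_BITS = 32  # assumed pointer width for this illustration
UPTR_MAX = (1 << UPTR_BITS) - 1


def wrap(value: int) -> int:
    """Truncate to UPTR_BITS bits, as unsigned machine arithmetic would."""
    return value & UPTR_MAX


def midpoint_naive(start: int, end: int) -> int:
    # (start + end) / 2: the sum wraps once it exceeds UPTR_MAX.
    return wrap(start + end) // 2


def midpoint_safe(start: int, end: int) -> int:
    # start + (end - start) / 2: every intermediate stays in range
    # as long as start <= end.
    return wrap(start + wrap(end - start) // 2)
```

For `start = 0x80000000` and `end = 0xC0000000`, the naive form gives `0x20000000`, which lies outside the range, while the safe form gives `0xA0000000`.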
For a new contributor, it helps to see the right maintainer.
…C (#189214)

ICF's InputSection::replace() calls markDead() on folded sections, so
`!sec->isLive()` already filters them.
@wenju-he
Contributor

[libclc] Fix native cpu build

Reverted in 54aa435. The missing symbols are implemented in libdevice, e.g.

__spirv_ControlBarrier(int32_t Execution, int32_t Memory,

I have skipped native_cpu check-libclc in d93a810. This aligns with sycl branch.

[libclc] Align AddLibclc.cmake with upstream LLVM (4465643)

LGTM. I just added a minor code formatting change in 6e622e5 to align with https://github.com/intel-restricted/applications.compilers.llvm-project/blob/ef83a191161833ae6a631d2a64630a88003e7ac0/libclc/CMakeLists.txt#L597-L601

@bader
Contributor

bader commented Apr 13, 2026

native_cpu is not tested in sycl branch

@intel/dpcpp-nativecpu-reviewers, there is a gap in testing native_cpu. Please, improve testing to cover libclc.

jsji and others added 11 commits April 13, 2026 21:06
XFAIL first to unblock pulldown.
…682bd (#35866)

Fix unit test failure in StencilTest.DescribeImplicitOperator caused by
commit f4682bd which introduced staged lambda initialization.

In SYCL/CUDA/OpenMP builds, getManglingNumber() is called between
setLambdaContextDecl() and setLambdaNumbering(), which can trigger
linkage computation for init-capture variables while the lambda is only
partially initialized. This causes the cached linkage to differ from the
linkage computed after full initialization, triggering the assertion in
DeduceVariableDeclarationType.

Solution: Invalidate cached linkage after deducing the type. This ensures
linkage is recomputed with the complete type.

Fixes regressions due to:

* 6df388f 2026-03-13 [sycl-web] Reland 8ce2b9c zahira.ammarguellat@intel.com
* f4682bd 2026-03-13 [Clang][ItaniumMangle] Fix recursive mangling for lambda init-captures (#182667) ototot@google.com

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix the sycl-oneapi-gpu-amdgpu.cpp test that broke after commit
2757b58 which merged changes from 'main' to 'sycl-web'.

The merge introduced ToolChain::normalizeOffloadTriple() which
automatically completes incomplete triples (e.g., "amdgcn" becomes
"amdgcn-amd-amdhsa"). However, the SYCL driver should reject
incomplete triples when directly specified by the user.

Updated Driver.cpp:1389 to avoid normalizing user-provided triples
for SYCL.
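
The intended behavior can be sketched as a small policy function. Everything in this Python model is hypothetical: the completion table contains only the one example from the commit message, and `KNOWN_COMPLETE_TRIPLES`, `resolve_sycl_triple`, and the `user_provided` flag are illustrative names, not the Clang driver's API.

```python
# Toy stand-in for the normalization step; it completes only the single
# example given in the commit message.
_COMPLETIONS = {"amdgcn": "amdgcn-amd-amdhsa"}

# Hypothetical set of triples the toy validator accepts.
KNOWN_COMPLETE_TRIPLES = {"amdgcn-amd-amdhsa"}


def normalize_offload_triple(triple: str) -> str:
    return _COMPLETIONS.get(triple, triple)


def resolve_sycl_triple(triple: str, user_provided: bool) -> str:
    """Return the triple to use, rejecting incomplete user-provided ones."""
    # User-provided triples are validated exactly as written; only
    # triples derived by the driver itself are normalized first.
    candidate = triple if user_provided else normalize_offload_triple(triple)
    if candidate not in KNOWN_COMPLETE_TRIPLES:
        raise ValueError(f"invalid SYCL target triple: {candidate!r}")
    return candidate
```

Under these assumptions, `resolve_sycl_triple("amdgcn", user_provided=True)` raises `ValueError`, while the same incomplete triple derived internally is completed and accepted.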

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix SYCLNativeCPUUtils vecz transform to handle the upstream change in
commit 3604119 that disallows calling getTerminator() on blocks without
terminators.

The rewireDivergentLoopExitBlocks function creates new basic blocks
(newDivergentLE) without terminators, but later operations like
DT->recalculate(), isReachable(), and predecessors() traverse the CFG
and call getTerminator(), which now asserts on blocks without terminators.

Applied the same workaround pattern used in commit 873322287a4312a7
(VPO/Paropt fix): temporarily add UnreachableInst terminators to newly
created blocks before CFG traversal operations, then remove and replace
them with proper terminators in computeNewTargets.

Changes:
- Add temporary UnreachableInst to newDivergentLE after creation
- Use getTerminatorOrNull() instead of getTerminator() where blocks
  may temporarily lack terminators
- Skip blocks without terminators in isReachable() CFG traversal
- Handle placeholder UnreachableInst terminators in computeNewTargets

Fixes: SYCL :: check_device_code/native_cpu/vectorization.cpp

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…lectives

The unsigned variants (uchar, ushort, uint, ulong) of __clc__get_group_scratch
functions were declared in collectives.cl but never defined in
collectives_helpers.cl for both PTX and AMD targets. This caused linker errors
during device code compilation when group broadcast operations used unsigned types.

Add implementations that cast from the corresponding signed type's scratch memory.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@againull againull marked this pull request as ready for review April 15, 2026 21:25
@againull againull requested review from a team and bader as code owners April 15, 2026 21:25
@jsji
Contributor

jsji commented Apr 15, 2026

@wenju-he
Contributor

@jsji
Contributor

jsji commented Apr 17, 2026

@intel/llvm-gatekeepers This is ready for merge. Please help to issue /merge. Thanks.

@KornevNikita
Contributor

/merge

@bb-sycl
Contributor

bb-sycl commented Apr 17, 2026

Fri 17 Apr 2026 01:10:56 PM UTC --- Start to merge the commit into sycl branch. It will take several minutes.

@bb-sycl
Contributor

bb-sycl commented Apr 17, 2026

Fri 17 Apr 2026 01:24:38 PM UTC --- Merge the branch in this PR to base automatically. Will close the PR later.

@bb-sycl bb-sycl merged commit 36595c8 into sycl Apr 17, 2026
76 of 78 checks passed

Labels

disable-lint Skip linter check step and proceed with build jobs
